HW 1

Dingrui Lei

This PDF is exported from a Jupyter notebook.

1 Backpropagation in a Simple Neural Network

a) Dataset

b) Activation Function

See source code in three_layer_neural_network.py.
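As a minimal sketch of what the activation functions in three_layer_neural_network.py look like (assuming a NumPy implementation; the function names `actFun` and `diff_actFun` are illustrative):

```python
import numpy as np

def actFun(z, type):
    """Apply the chosen activation function elementwise."""
    if type == 'tanh':
        return np.tanh(z)
    elif type == 'sigmoid':
        return 1.0 / (1.0 + np.exp(-z))
    elif type == 'relu':
        return np.maximum(0.0, z)
    raise ValueError('unknown activation: %s' % type)

def diff_actFun(z, type):
    """Derivative of the activation, needed for backpropagation."""
    if type == 'tanh':
        return 1.0 - np.tanh(z) ** 2
    elif type == 'sigmoid':
        s = 1.0 / (1.0 + np.exp(-z))
        return s * (1.0 - s)
    elif type == 'relu':
        # subgradient: 1 where z > 0, else 0
        return (np.asarray(z) > 0).astype(float)
    raise ValueError('unknown activation: %s' % type)
```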

c) Build the Neural Network

See source code in three_layer_neural_network.py.

d) Backward Pass - Backpropagation

See source code in three_layer_neural_network.py.
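A minimal sketch of the backward pass for the three-layer net (input, one hidden layer, softmax output), assuming tanh hidden activation and cross-entropy loss; the function name `backprop` and the parameter layout are illustrative, not the exact code in the source file:

```python
import numpy as np

def backprop(X, y, W1, b1, W2, b2):
    """One forward/backward pass; returns gradients of the mean
    cross-entropy loss with respect to all parameters."""
    n = X.shape[0]
    # forward pass
    z1 = X @ W1 + b1
    a1 = np.tanh(z1)
    z2 = a1 @ W2 + b2
    exp = np.exp(z2 - z2.max(axis=1, keepdims=True))   # stable softmax
    probs = exp / exp.sum(axis=1, keepdims=True)
    # backward pass: for softmax + cross-entropy, dL/dz2 = probs - onehot(y)
    delta3 = probs.copy()
    delta3[np.arange(n), y] -= 1.0
    delta3 /= n
    dW2 = a1.T @ delta3
    db2 = delta3.sum(axis=0)
    # chain rule through the hidden layer: multiply by tanh'(z1)
    delta2 = (delta3 @ W2.T) * (1.0 - np.tanh(z1) ** 2)
    dW1 = X.T @ delta2
    db1 = delta2.sum(axis=0)
    return dW1, db1, dW2, db2
```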

e) Train network with different activation functions

Differences that I observe:

The boundary generated by Tanh is smoother.

The boundary generated by Sigmoid is smoother.

The boundary generated by ReLU is sharper (piecewise linear).

Model with Tanh

Model with Sigmoid

Model with ReLU

Model with Tanh, adding more units

Differences that I observe:

It seems that overfitting appears.

The boundary generated by Tanh with more hidden units is more wiggly.

f) Training a Deeper Network!!!

See source code in n_layer_neural_network.py.
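The core change for the deeper network is looping layer by layer instead of hard-coding two weight matrices. A minimal sketch of that structure, assuming tanh hidden activations (the `Layer` class and `feedforward` names are illustrative):

```python
import numpy as np

class Layer:
    """One fully connected layer with tanh activation."""
    def __init__(self, n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        # scale by 1/sqrt(n_in) to keep pre-activations moderate
        self.W = rng.standard_normal((n_in, n_out)) / np.sqrt(n_in)
        self.b = np.zeros(n_out)

    def feedforward(self, a):
        self.z = a @ self.W + self.b
        self.a = np.tanh(self.z)
        return self.a

def feedforward(layers, X):
    """Propagate X through an arbitrary stack of layers."""
    a = X
    for layer in layers:
        a = layer.feedforward(a)
    return a
```

For example, the "first layer with 1 hidden neuron, second layer with 3" configuration on 2-D input with 2 output classes would be `[Layer(2, 1), Layer(1, 3), Layer(3, 2)]`.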

First layer with 1 hidden neuron, second layer with 3 hidden neurons. It performs normally.

First layer with 3 hidden neurons, second layer with 5 hidden neurons. Overfitting appears, with a weird boundary and noisy nodes.

First layer with 6 hidden neurons, second layer with 10 hidden neurons. The result is severely overfitted.

Import new testing dataset

First layer with 1 hidden neuron, second layer with 3 hidden neurons. The result is bad.

First layer with 3 hidden neurons, second layer with 5 hidden neurons. The result is still bad.

First layer with 5 hidden neurons, second layer with 10 hidden neurons. The classification is close to good.

2 Training a Simple Deep Convolutional Network on MNIST

a) Build and Train a 4-layer DCN

See source code in dcn_mnist.py.
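The DCN is built from convolution and pooling blocks. As a framework-free sketch of those two building blocks (single-channel, NumPy only; the actual dcn_mnist.py presumably uses a deep-learning framework's optimized ops):

```python
import numpy as np

def conv2d(x, k):
    """'Valid' 2-D cross-correlation of one image x with one kernel k."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # dot product of the kernel with the current window
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def maxpool2(x):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))
```

A 4-layer DCN for MNIST typically stacks two conv+pool blocks like these, followed by two fully connected layers.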

Run Training

Visualize Training


b) Build and Train a 4-layer DCN


c) Time for More Fun!!!

See modified code in modified_dcn_mnist.py.

What I observe:

I substitute tanh for ReLU as the activation function and initialize the weights and biases with Xavier initialization. In the original histograms generated by TensorBoard, it can be clearly seen that most of the neurons in the network are dead, with zero activation. During backpropagation, those dead neurons block the gradient flow and can cause slower learning. After switching to tanh and Xavier initialization, as expected, the distribution of neuron activations more closely resembles a normal distribution, and most neurons in the network have nonzero activations. The neural network also learns slightly faster, as shown in the test-accuracy graph.
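A minimal sketch of the Xavier (Glorot) uniform initialization used for the weights, assuming the standard limit sqrt(6 / (n_in + n_out)) (the function name `xavier_init` is illustrative):

```python
import numpy as np

def xavier_init(n_in, n_out, seed=0):
    """Draw a weight matrix uniformly from [-limit, limit] with
    limit = sqrt(6 / (n_in + n_out)), which keeps activation and
    gradient variance roughly constant across layers."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    rng = np.random.default_rng(seed)
    return rng.uniform(-limit, limit, size=(n_in, n_out))
```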
